Full text available: pdf(1.74 MB) Additional Information: full citation, abstract, references, citings
Publisher Site 

This paper presents an algorithm for identifying the noun phrase antecedents of third person 
pronouns and lexical anaphors (reflexives and reciprocals). The algorithm applies to the 
syntactic representations generated by McCord's Slot Grammar parser and relies on salience 
measures derived from syntactic structure and a simple dynamic model of attentional state. 
Like the parser, the algorithm is implemented in Prolog. The authors have tested it 
extensively on computer manual texts and conducted a ... 

Understanding distributed applications is a tedious and difficult task. Visualizations based on
process-time diagrams are often used to obtain a better understanding of the execution of 
the application. The visualization tool we use is Poet, an event tracer developed at the 
University of Waterloo. However, these diagrams are often very complex and do not provide 
the user with the desired overview of the application. In our experience, such tools display 
repeated occurrences of non-trivial commun ... 

Various strategies are proposed to identify and classify three types of proper nouns in
Chinese texts. Clues from character, sentence and paragraph levels are employed to resolve 
Chinese personal names. Character, Syllable and Frequency Conditions are presented to 
treat transliterated personal names. To deal with organization names, keywords, prefix, 
word association and parts-of-speech are applied. For fair evaluation, large scale test data 
are selected from six sections of a newspaper. The pre ... 

5 Pen computing: a technology overview and a vision
Andre Meyer 

July 1995 ACM SIGCHI Bulletin, Volume 27 issue 3 

Full text available: pdf(5.14 MB) Additional Information: full citation, abstract, citings, index terms

This work gives an overview of a new technology that is attracting growing interest in public 
as well as in the computer industry itself. The visible difference from other technologies is in 
the use of a pen or pencil as the primary means of interaction between a user and a 
machine, picking up the familiar pen and paper interface metaphor. From this follows a set 
of consequences that will be analyzed and put into context with other emerging technologies 
and visions. Starting with a short historic ... 

July 1993 Proceedings of the 16th annual international ACM SIGIR conference on

Full text available: pdf(1.02 MB)

We argue that the advent of large volumes of full-length text, as opposed to short texts like 
abstracts and newswire, should be accompanied by corresponding new approaches to 
information access. Toward this end, we discuss the merits of imposing structure on full- 
length text documents; that is, a partition of the text into coherent multi-paragraph units 
that represent the pattern of subtopics that comprise the text. Using this structure, we can 
make a distinction between th ... 

7 Special issue on using large corpora: I: Text-translation alignment 
Martin Kay, Martin Roscheisen 

March 1993 Computational Linguistics, Volume 19 issue 1

Full text available: pdf Additional Information: full citation, abstract, references, citings
Publisher Site 

We present an algorithm for aligning texts with their translations that is based only on 
internal evidence. The relaxation process rests on a notion of which word in one text
corresponds to which word in the other text that is essentially based on the similarity of 
their distributions. It exploits a partial alignment of the word level to induce a maximum 
likelihood alignment of the sentence level, which is in turn used, in the next iteration, to 
refine the word level estimate. The algorithm appe ... 

Marti A. Hearst

March 1997 Computational Linguistics, Volume 23 issue 1

Full text available: pdf(2.46 MB) Additional Information: full citation, abstract, references, citings
Publisher Site 

TextTiling is a technique for subdividing texts into multi-paragraph units that represent 
passages, or subtopics. The discourse cues for identifying major subtopic shifts are patterns 
of lexical co-occurrence and distribution. The algorithm is fully implemented and is shown to 
produce segmentation that corresponds well to human judgments of the subtopic 
boundaries of 12 texts. Multi-paragraph subtopic segmentation should be useful for many 
text analysis tasks, including information retrieval and ... 

Full text available: pdf(3.13 MB)

Aspects of an intelligent interface that provides natural language access to a large body of 
data distributed over a computer network are described. The overall system architecture is 
presented, showing how a user is buffered from the actual database management systems 
(DBMSs) by three layers of insulating components. These layers operate in series to convert 
natural language queries into calls to DBMSs at remote sites. Attention is then focused on 
the first of the insulating components, th ... 

run-time personalization, semantic grammar

August 1996 Proceedings of the 16th conference on Computational linguistics - Volume

In this paper, we propose an algorithm for aligning words with their translation in a bilingual
corpus. Conventional algorithms are based on word-by-word models which require bilingual 
data with hundreds of thousands of sentences for training. By using a word-based approach,
less frequent words or words with diverse translations generally do not have statistically 
significant evidence for confident alignment. Consequently, incomplete or incorrect 
alignments occur. Our algorithm attempts to handle th ... 

This article reviews the available methods for automated identification of objects in digital
images. The techniques are classified into groups according to the nature of the 
computational strategy used. Four classes are proposed: (1) the simplest strategies, which 
work on data appropriate for feature vector classification, (2) methods that match models to 
symbolic data structures for situations involving reliable data and complex models, (3) 
approaches that fit models to the photometry and ... 

review

An earlier paper developed a procedure for compressing concordances, assuming that all 
elements occurred independently. The models introduced in that paper are extended here to
take the possibility of clustering into account. The concordance is conceptualized as a set of
bitmaps, in which the bit locations represent documents, and the one-bits represent the
occurrence of given terms. Hidden Markov Models (HMM's) are used to describe the 
clustering of the one-bits. However, for computational ... 

Keywords: classification of graph nodes, concordance organization, concordance storage, 

The need to model the relation between discourse structure and linguistic features of
utterances is almost universally acknowledged in the literature on discourse. However, there
is only weak consensus on what the units of discourse structure are, or the criteria for
recognizing and generating them. We present quantitative results of a two-part study using 
a corpus of spontaneous, narrative monologues. The first part of our paper presents a 
method for empirically validating multi-utterance units ...

18 Novelty and topic change: Domain-independent text segmentation using anisotropic
diffusion and dynamic programming
Xiang Ji, Hongyuan Zha 

July 2003 Proceedings of the 26th annual international ACM SIGIR conference on 
Research and development in information retrieval

Full text available: pdf(171.61 KB) Additional Information: full citation, abstract, references, index terms

This paper presents a novel domain-independent text segmentation method, which identifies 
the boundaries of topic changes in long text documents and/or text streams. The method 
consists of three components: As a preprocessing step, we eliminate the document- 
dependent stop words as well as the generic stop words before the sentence similarity is 
computed. This step assists in the discrimination of the sentence semantic information. Then 
the cohesion information of sentences in a document o ... 

Keywords: anisotropic diffusion, document-dependent stop words, dynamic programming, 
text segmentation 

20 The rhetorical parsing of unrestricted texts: a surface-based approach
Daniel Marcu 

September 2000 Computational Linguistics, Volume 26 issue 3

Full text available: pdf(3.87 MB) Additional Information: full citation, abstract, references
Publisher Site 

Coherent texts are not just simple sequences of clauses and sentences, but rather complex 
artifacts that have highly elaborate rhetorical structure. This paper explores the extent to 
which well-formed rhetorical structures can be automatically derived by means of surface- 
form-based algorithms. These algorithms identify discourse usages of cue phrases and break 
sentences into clauses, hypothesize rhetorical relations that hold among textual units, and 
produce valid rhetorical structure trees for ... 